ESXi Disk Health Check

1 List all storage devices

esxcli storage core device list
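
The device identifiers printed by this command are what the `-d` option in section 2.1 expects. Below is a minimal Python sketch for pulling them out of captured output. It assumes the usual layout of an unindented identifier line followed by indented `Key: Value` lines, including an `Is SSD` field. The sample text is an abbreviated, made-up illustration (the second device ID, the display names, and the sizes are hypothetical), and `parse_device_list` and `non_ssd_ids` are names introduced here.

```python
def parse_device_list(text):
    """Parse `esxcli storage core device list` style output into {device_id: {field: value}}."""
    devices = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new device block and holds its identifier.
            current = devices.setdefault(line.strip(), {})
        elif current is not None and ":" in line:
            # Indented lines are "Key: Value" pairs; split on the first colon only.
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return devices


def non_ssd_ids(devices):
    """Return the identifiers of devices whose 'Is SSD' field is 'false'."""
    return [dev_id for dev_id, fields in devices.items() if fields.get("Is SSD") == "false"]


# Illustrative, abbreviated sample: the field values are made up, not real output.
SAMPLE_DEVICE_LIST = """\
t10.ATA_____HGST_HDN726040ALE614____________________K3H1RKTD____________
   Display Name: Local ATA Disk (example)
   Size: 3815447
   Is SSD: false
t10.NVMe____EXAMPLE_SSD
   Display Name: Local NVMe Disk (example)
   Size: 953869
   Is SSD: true
"""

if __name__ == "__main__":
    for dev_id in non_ssd_ids(parse_device_list(SAMPLE_DEVICE_LIST)):
        print(dev_id)
```

The real output has many more fields per device; the parser keeps all of them as strings, so other filters (for example on `Device Type`) can be added in the same way.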

2 Check HDD drives

2.1 Check a specific HDD

esxcli storage core device smart get -d t10.ATA_____HGST_HDN726040ALE614____________________K3H1RKTD____________
esxcli storage core device smart get -d t10.ATA_____WDC_WD80EFAX2D68KNBN0____________________VAGJSVBL____________

2.2 Interpreting SMART attributes

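SMART attribute | Meaning | Severity | Normal value
----------------|---------|----------|-------------
5 - Reallocated Sectors Count | Bad sectors that have been remapped to spare sectors | 🔴 Critical | 0 (any increase means the disk is deteriorating)
187 - Reported Uncorrectable Errors | Errors that could not be corrected (failed data reads) | 🔴 Critical | 0 (above 0 may indicate a problem)
188 - Command Timeout | Commands that timed out (possible disk fault or power problem) | 🟠 Warning | 0 (above 0 may indicate a problem)
197 - Current Pending Sector Count | Sectors waiting to be remapped (sectors the disk has flagged as about to fail) | 🔴 Critical | 0 (above 0 means the disk is close to failing)
198 - Offline Uncorrectable Sector Count | Bad sectors that cannot be recovered | 🔴 Critical | 0 (any occurrence means physical damage)
199 - UDMA CRC Error Count | Data transfer errors (possibly the SATA cable or the disk) | 🟠 Warning | 0 (occasional errors usually point to the data cable)
9 - Power-On Hours (POH) | Total time the disk has been powered on, in hours | 🟡 Reference | < 30,000 hours (roughly 3.4 years of continuous operation; watch the disk beyond that)
194 - Temperature (Celsius) | Disk temperature | 🟡 Reference | 30 to 45 °C (above 50 °C the disk needs better cooling)
1 - Read Error Rate | Read error rate (some vendors do not document the raw value) | 🟡 Reference | Should not keep increasing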
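
The thresholds in the table above can be written down as a small rule set. The following Python sketch assumes you already have the raw counter for each attribute ID. Depending on the ESXi version, `smart get` may print normalized values rather than raw counters, so how the raw numbers are obtained is left open here. `RULES` and `evaluate` are names introduced for illustration, and the temperature rule uses the upper end of the table's normal range (45 °C). Attribute 1 is omitted because the table gives it a trend rather than a fixed limit.

```python
# Rules transcribed from the table: attribute ID -> (name, severity, "is the raw value normal?").
RULES = {
    5:   ("Reallocated Sectors Count", "critical", lambda v: v == 0),
    187: ("Reported Uncorrectable Errors", "critical", lambda v: v == 0),
    188: ("Command Timeout", "warning", lambda v: v == 0),
    197: ("Current Pending Sector Count", "critical", lambda v: v == 0),
    198: ("Offline Uncorrectable Sector Count", "critical", lambda v: v == 0),
    199: ("UDMA CRC Error Count", "warning", lambda v: v == 0),
    9:   ("Power-On Hours", "reference", lambda v: v < 30000),
    194: ("Temperature (Celsius)", "reference", lambda v: v <= 45),
}


def evaluate(raw_values):
    """Return (attribute_id, name, severity) for every attribute outside its normal range.

    raw_values maps a SMART attribute ID to its raw counter (assumed input).
    Attributes without a rule are ignored.
    """
    findings = []
    for attr_id, value in sorted(raw_values.items()):
        rule = RULES.get(attr_id)
        if rule is None:
            continue
        name, severity, is_normal = rule
        if not is_normal(value):
            findings.append((attr_id, name, severity))
    return findings


if __name__ == "__main__":
    # Hypothetical readings, not taken from a real disk.
    example = {5: 0, 197: 8, 199: 2, 9: 41000, 194: 38}
    for attr_id, name, severity in evaluate(example):
        print(f"{severity:9} {attr_id:>3} {name}")
```

With the hypothetical readings above, the sketch flags attributes 9, 197, and 199 and leaves 5 and 194 alone.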

3 Check NVMe SSDs

3.1 List all NVMe drives

esxcli nvme device list

View the SSD's SMART information

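Pass the adapter name of each SSD after -A. The Percentage Used field (for example, "Percentage Used: 97 %") reports how much of the drive's rated endurance has been consumed; once it reaches or exceeds 100 %, the drive should be replaced.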

esxcli nvme device log smart get -A vmhba4
esxcli nvme device log smart get -A vmhba5
SMART And Health Info:
   Available Spare Space Below Threshold: false
   Temperature Warning: false
   NVM Subsystem Reliability Degradation: false
   Read Only Mode: false
   Volatile Memory Backup Device Failure: false
   Composite Temperature: 320 K
   Available Spare: 100 %
   Available Spare Threshold: 10 %
   Percentage Used: 26 %
   Data Units Read: 0x6b463c9
   Data Units Written: 0x7dd173b
   Host Read Commands: 0x37792882
   Host Write Commands: 0x33d682e1
   Controller Busy Time: 0xea8
   Power Cycles: 0x434
   Power On Hours: 0x16ab
   Unsafe Shutdowns: 0x67
   Media Errors: 0x0
   Number of Error Info Log Entries: 0x4d
   Warning Composite Temperature Time: 0 Mins
   Critical Composite Temperature Time: 0 Mins
   Temperature Sensor 1: 320 K
   Temperature Sensor 2: 0 K
   Temperature Sensor 3: 0 K
   Temperature Sensor 4: 0 K
   Temperature Sensor 5: 0 K
   Temperature Sensor 6: 0 K
   Temperature Sensor 7: 0 K
   Temperature Sensor 8: 0 K
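
Most counters in this output are hexadecimal and the temperatures are in kelvin, so a little conversion makes the log easier to read. The Python sketch below parses a few of the lines shown above and derives a short summary. `parse_nvme_smart`, `to_int`, and `summarize` are names introduced here, and treating "Percentage Used >= 100" as the replacement point is an assumption based on the note in this section.

```python
def parse_nvme_smart(text):
    """Turn 'Key: Value' lines into a dict, skipping headers that have no value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # e.g. the "SMART And Health Info:" header line
        fields[key.strip()] = value.strip()
    return fields


def to_int(value):
    """Convert values such as '0x16ab', '26 %', '320 K' or '0 Mins' to an int."""
    token = value.split()[0]
    return int(token, 16) if token.lower().startswith("0x") else int(token)


def summarize(fields):
    """Derive human-readable numbers from the parsed log."""
    used = to_int(fields["Percentage Used"])
    return {
        "temperature_c": round(to_int(fields["Composite Temperature"]) - 273.15, 2),
        "percentage_used": used,
        "power_on_hours": to_int(fields["Power On Hours"]),
        "media_errors": to_int(fields["Media Errors"]),
        "spare_ok": to_int(fields["Available Spare"]) > to_int(fields["Available Spare Threshold"]),
        "needs_replacement": used >= 100,  # assumption: replace at 100 % or more
    }


# Excerpt of the output shown above.
SAMPLE_NVME_LOG = """\
SMART And Health Info:
   Composite Temperature: 320 K
   Available Spare: 100 %
   Available Spare Threshold: 10 %
   Percentage Used: 26 %
   Power On Hours: 0x16ab
   Media Errors: 0x0
"""

if __name__ == "__main__":
    for name, value in summarize(parse_nvme_smart(SAMPLE_NVME_LOG)).items():
        print(f"{name}: {value}")
```

For the sample above, 320 K is about 46.85 °C and 0x16ab is 5803 power-on hours, roughly 242 days.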

3.2 List the namespaces of the corresponding NVMe SSD

esxcli nvme device namespace list -A vmhba5